Association of early menarche with breast tumor molecular features and recurrence

Background Early menarche is an established risk factor for breast cancer but its molecular contribution to tumor biology and prognosis remains unclear. Methods We profiled transcriptome-wide gene expression in breast tumors (N = 846) and tumor-adjacent normal tissues (N = 666) from women in the Nurses’ Health Studies (NHS) to investigate whether early menarche (age < 12) is associated with tumor molecular and prognostic features in women with breast cancer. Multivariable linear regression and pathway analyses using competitive gene set enrichment analysis were conducted in both tumor and adjacent-normal tissue and externally validated in TCGA (N = 116). Subgroup analyses stratified on ER-status based on the tumor were also performed. PAM50 signatures were used for tumor molecular subtyping and to generate proliferation and risk of recurrence scores. We created a gene expression score using LASSO regression to capture early menarche based on 28 genes from FDR-significant pathways in breast tumor tissue in NHS and tested its association with 10-year disease-free survival in both NHS (N = 836) and METABRIC (N = 952). Results Early menarche was significantly associated with 369 individual genes in adjacent-normal tissues implicated in extracellular matrix, cell adhesion, and invasion (FDR ≤ 0.1). Early menarche was associated with upregulation of cancer hallmark pathways (18 significant pathways in tumor, 23 in tumor-adjacent normal, FDR ≤ 0.1) related to proliferation (e.g. Myc, PI3K/AKT/mTOR, cell cycle), oxidative stress (e.g. oxidative phosphorylation, unfolded protein response), and inflammation (e.g. pro-inflammatory cytokines IFN\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha$$\end{document}α and IFN\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma$$\end{document}γ). Replication in TCGA confirmed these trends. Early menarche was associated with significantly higher PAM50 proliferation scores (β = 0.082 [0.02–0.14]), odds of aggressive molecular tumor subtypes (basal-like, OR = 1.84 [1.18–2.85] and HER2-enriched, OR = 2.32 [1.46–3.69]), and PAM50 risk of recurrence score (β = 4.81 [1.71–7.92]). Our NHS-derived early menarche gene expression signature was significantly associated with worse 10-year disease-free survival in METABRIC (N = 952, HR = 1.58 [1.10–2.25]). Conclusions Early menarche is associated with more aggressive molecular tumor characteristics and its gene expression signature within tumors is associated with worse 10-year disease-free survival among women with breast cancer. As the age of onset of menarche continues to decline, understanding its relationship to breast tumor characteristics and prognosis may lead to novel secondary prevention strategies. Supplementary Information The online version contains supplementary material available at 10.1186/s13058-024-01839-0.


Introduction
Early menarche, the onset of the menstrual cycle at an early age (< 12 years old, the median age at menarche in the United States), is consistently associated with increased breast cancer risk [1,2].In large, pooled studies and meta-analyses, each year of younger onset of menarche was associated with a 5-9% increased risk of breast cancer [1][2][3].The increase in breast cancer risk due to lengthening of reproductive cycling is thought to arise from higher levels and longer exposure time to estrogens [1,2], which are known mitogens and drive a number of cancers [4].However, no prior study to our knowledge has comprehensively investigated tumor molecular characteristics associated with early menarche.Further, the impact of early menarche on breast cancer prognosis also remains largely under-studied.Data from the National Health and Nutrition Examination Survey (NHANES) reveal that average ages of menarche declined by as much as 11 months between 1920 and 1980, with non-Hispanic Black women showing the largest decline in mean age of menarche (14 months) [5].As the decreasing secular trend of age of onset of menarche continues [6], understanding the underlying mechanisms through which early menarche is associated with tumorigenesis could lead to novel prevention strategies.However, there is a general paucity of data linking early life exposures, such as age of menarche, with molecular tumor features that occur decades later in life.Here, we utilized the long-term prospective epidemiological data and enriched tumor molecular data in the Nurses' Health Studies (NHS), as well as validation using The Cancer Genome Atlas (TCGA) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) databases, to investigate how age at menarche may be associated with tumor molecular characteristics and prognosis in women with breast cancer.

Study population
We used data from two large-scale prospective cohorts of registered female nurses in the United States, the NHS and NHSII.NHS (established in 1976) recruited 121,701 women between ages 30 and 55 years and NHSII (initiated in 1989) enrolled 116,429 women between ages 25 and 42 years.In both cohorts, participants completed an initial study questionnaire and subsequent questionnaires biennially afterward; cumulative follow-up rates were greater than 90% [7].As described previously [8], invasive breast cancer cases were identified initially by questionnaire (start of follow-up to 2012) or National Death Index search upon lack of participant response; breast cancer cases were confirmed by centralized medical record review using established protocols.No breast cancer cases included in our analysis had any prior personal history of cancer.Completion of the questionnaire was considered to imply informed consent upon study protocol approval by the Institutional Review Boards of the Brigham and Women's Hospital (Boston, MA) and Harvard T.H. Chan School of Public Health (Boston, MA) in 1976 (NHS) and 1989 (NHSII).NHSI/II were conducted in accordance with recognized ethical guidelines (Declaration of Helsinki).

Gene expression measurements
954 incident breast cancer cases within the study were eligible for transcriptomic analysis [10], which had participant characteristics similar to those without gene expression data.RNA was extracted from multiple cores of 1 or 1.5 mm procured from FFPE tumor (n = 1-3 cores) and matched normal adjacent tissue taken from > 1 cm from the tumor margin during surgery (n = 3-5 cores) and isolated using the Qiagen AllPrep RNA Isolation Kit.Transcriptome-wide gene expression was profiled using Affymetrix Glue Grant Human Transcriptome Array 3.0 (hGlue 3.0) and Human Transcriptome Array 2.0 (HTA 2.0) microarray chips.Normalization was performed using robust multiarray averages.Data was log-2-transformed and Affymetrix Power Tools probeset summarization-based metrics were used for quality control.After quality control and exclusion based on missing data, 846 tumor and 666 normal-adjacent tissues were used in this analysis.Gene expression data was deposited in Gene Expression Omnibus and is publicly available (GEO#; GPL22920, GSE93601, GSE115577).Of note, participant characteristics of breast cancer cases with and without gene expression were similar [11].The most variable probe was selected to represent the gene when genes were mapped by multiple probes.ComBat, which is an empirical Bayes method for batch effects, was used to control for technical variabilities [12].Genes with expression in the lowest quartile (< 25%) were excluded from the analyses.

Exposure and covariates
Age at menarche (age in years) and height (measured in feet and inches) were reported on the baseline study questionnaire.Weight at age 18 was reported during 1980 questionnaire cycle for NHSI and baseline study questionnaire for NHSII.Race was reported during 1992 questionnaire cycle for NHSI and baseline study questionnaire for NHSII.History of oral contraceptive use, menopausal status, parity, age at breast cancer diagnosis, calendar year of breast cancer diagnosis, weight and physical activity level at time of diagnosis were obtained via the biennial NHS and NHSII questionnaires.BMI was calculated by dividing the participant's weight in kilograms by their height in meters squared (kg/m 2 ).Tumor characteristics (stage and grade), treatment information (chemotherapy, radiotherapy, and endocrine therapy) were obtained from medical records or supplemental questionnaire.Estrogen receptor (ER) status was determined after central review of breast cancer tissue microarrays (TMAs), or pathology reports if missing.Based on previous literature, we defined early menarche as menstruation occurring before age 12, the median age at menarche in the U.S [9].Therefore, age at menarche, our primary exposure, was dichotomized and modeled as a categorical variable of "early" (< 12 years old) vs. "later" (≥ 12 years old).Our analyses were restricted to complete cases that included the following covariates, selected a priori: age at breast cancer diagnosis (continuous), year of diagnosis (continuous), tumor stage (1-4), chemotherapy (yes/no/unknown), radiation (yes/no/unknown), endocrine therapy (yes/no/unknown), oral contraceptive use (current user/past user/never user/unknown), race (White/non-White), parity (continuous), BMI at age 18 (continuous), BMI change (BMI at diagnosis -BMI at age 18), and physical activity at time of diagnosis (continuous).Within our analytic cohort with gene expression measurements, 52 cases were excluded from analysis due to missing information on age at menarche; missing BMI and physical activity data were imputed using the median.Secondary analyses were also performed modeling age at menarche as continuous.

Breast cancer recurrence
Breast cancer recurrences were determined as previously described [13].Briefly, supplemental questionnaires were sent to living cohort members with a previously confirmed diagnosis of breast cancer.Reports of new cancers of the liver, bone, brain, and lung-common sites of breast cancer metastasis-following their breast cancer diagnosis were considered breast cancer recurrences.Participants who died from breast cancer without previous report of recurrence were also presumed to have a breast cancer recurrence.The time scale of disease-free survival is thus defined as the time from initial diagnosis to either recurrence or end of follow-up, with participants who died of other causes censored at time of death.Deaths were most commonly reported by families, and deaths among nonrespondents were identified through the National Death Index, as is consistent in previous NHS analyses [14].Once a death was reported, the specific cause was subsequently determined by medical record review or death certificate.

Statistical analysis Age at menarche and gene expression
We evaluated the association between age at menarche with transcriptome-wide gene expression for each individual gene using covariate-adjusted linear regression (limma) [15].Each regression model was adjusted for confounders determined a priori and surrogate variables generated from the transcriptome data (the leek method from Bioconductor sva package in R) [16].Analyses were performed on tumor and tumor-adjacent tissues separately.Subgroup analyses stratified on ER-status based on the tumor.We used an FDR ≤ 0.1 to determine whether a gene meets transcriptome-wide significance [17].Functional enrichment of biological pathways associated with age of menarche was performed using Correlation Adjusted Mean Rank (CAMERA), a competitive gene set test [18] using an intergene correlation of 0.01.The same set of covariates used in the single gene analysis are controlled for here.We used the 50 cancer "hallmark" gene sets from the Molecular Signature Database (MSigDB; http:// www.broad insti tute.org/ gsea/ msigdb/) in our pathway analyses and an FDR ≤ 0.1 to determine statistical significance.For external validation, pathway analyses were replicated in a small subset of TCGA for which RNA-sequencing data and information on age at menarche was available (N = 116) [10,19].For this validation dataset, six TCGA sites were originally contacted and data from three (Roswell Park Cancer Institute, University of Pittsburgh, Mayo Clinic) sites that agreed to collect or provide breast cancer risk factor data on these cases were included in this study, as previously described [20].Covariates from TCGA were selected a priori to match those used in the NHS analysis as closely as possible, though there are some differences: age at breast cancer diagnosis (continuous), year of diagnosis (continuous), tumor stage (1)(2)(3)(4), race (White/non-White), parity (continuous), BMI at diagnosis (continuous), ER status (yes/no), and menopausal status (yes/no).Pathway analyses were again performed using CAMERA as described above and hits with FDR ≤ 0.2 were considered significant for validation.

Age at menarche and PAM50 scores
Proliferation score and risk of recurrence scores based on PAM50 assay were computed as described previously in NHS [21].Briefly, proliferation score is computed by averaging gene expression of 11 genes (BIRC5, CCNB1, CDC20, NUF2, CEP55, NDC80, MKI67, PTTG1, RRM2, TYMS and UBE2C) [22].Risk of recurrence (ROR) score combines gene expression of 50 gene in the PAM50 assay with tumor size and nodal status to compute an integer score proportional to risk of recurrence (0-100).PAM50 ROR score has been found to be highly predictive of risk of distant relapse [23].Multiple linear regression was performed using these scores as continuous dependent variables and age at menarche as the main predictor.PAM50 is frequently used to classify breast tumors based on their gene expression into five molecular subtypes that differ both in biological characteristics and prognosis [24].Multinomial regression was performed to test association of early menarche with PAM50-based intrinsic molecular subtypes (luminal B, normal, HER2, and basal) compared to the least aggressive luminal A subtype [24].All other covariates previously mentioned were used for adjustment.

Early menarche-derived gene expression signature and breast cancer recurrence
To examine associations between an early menarcheassociated gene expression signature and breast cancer recurrence, we included individual genes from FDR-significant pathways in the tumor (receptor-agnostic) showing nominal significance (p ≤ 0.05) to create a gene expression score, calculated as ∑(z-transformed genes from positively regulated pathways)-∑(z-transformed genes from negatively regulated pathways).LASSO regression was used in glmnet in R to select the most predictive genes while preventing overfitting through shrinkage of the regression coefficients [25].
Discovery cohort: Nurses' Health Studies We used Cox proportional hazards regression to examine the association between our menarche-associated gene expression signature and breast cancer recurrence among stage 1-3 breast cancer cases.Scores were modeled as categorical, using quartiles of expression as cut points to make scores from 1 to 4, with 1 representing the lowest score (most dissimilar to early menarche) and 4 representing the highest score (most similar to early menarche).Hazard ratios and 95% confidence intervals were calculated.Recurrence-free survival was defined as the time between cancer diagnosis and either recurrence (cancer detected at common metastatic sites, such as bone, brain, lungs, and liver) or death from breast cancer without reported recurrence (12).We evaluated an interaction term between score and the log of recurrence time to test violation of the proportional hazards assumption with a likelihood ratio test.Proportional hazards were violated; we therefore applied a piecewise Cox model using the crossing of the Kaplan Meier curves as our cut point, which was 10 years.
Validation cohort: METABRIC To assess the generalizability of our menarche-associated gene expression signature, we leveraged an independent dataset, the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), for validation.Using our NHS-derived gene signature, we computed the early menarche-associated score in tumors from METABRIC using the available gene expression data.We then used a Cox regression model to assess the association of the score with breast cancer recurrence.Covariates included: age at diagnosis, ER status, batch, menopausal status, cancer stage, and treatment (chemotherapy, radiotherapy, and/or hormonal therapy).Analysis was restricted to complete cases to include a total of 952 breast cancer cases, stage 1-3 only.Score was again modeled as categorical, using quartiles of expression in METABRIC as cut points.We evaluated an interaction term between score and the log of recurrence time to test violation of the proportional hazards assumption; no violation was found.

Participant characteristics
Of the 846 NHS women with tumor gene expression data, 206 had early menarche (< 12 years of age) (Table 1).Demographics were similar between women with early or later menarche, including age and year at breast cancer diagnosis and race, which is predominantly White (~ 95%) in the NHS cohort.Nearly 70% of women were postmenopausal at diagnosis in both the early and later menarche groups.Frequencies of breast cancer stage were similar between groups, consisting of mostly stage 1-3 tumors.Modest differences were observed in therapies used in disease management, with radiation therapy more commonly used among women with later age at menarche than early menarche (52.8% vs. 33.5%,respectively).Conversely, 51.0% of those with early menarche received chemotherapy compared to 43.3% of those with later menarche.Endocrine therapy and oral contraceptive use were similar between groups.Women in both groups had a median of two children and similar levels of physical activity.

Gene expression and pathway analysis in women who experienced early menarche
A full study workflow is presented in Fig. 1.We did not observe a statistically significant association between early menarche and any individual gene in tumor tissues.However, upon stratification by ER status, we identified 28 genes associated with early menarche in ER-positive tumors (FDR ≤ 0.1) (Supplemental Fig. 1, Table S1).Among these were genes associated with Notch and TGFß signaling pathways (DMXL2 and LRRC32, respectively), DNA damage (HUS1), cell stress and metastatic cell survival (ERLEC1), extracellular matrix (COL16A1), and proliferation (PIK3C3).However, no single genes were significant in ER-negative tumors.In the matched tumor-adjacent normal tissue, 369 individual genes were significantly associated with early menarche (FDR ≤ 0.1) (Table S2).Included among these were genes positively associated with extracellular matrix, cell adhesion, and invasion (e.g., RHOA, RAB1A, CTNND1, ITFG1, CAV1, CST3), cell cycle/proliferation (e.g., CDC16, CDC42, GSK3B, MAPKA5), and other genes with known roles in tissue transformation.Further stratification by estrogen receptor status found early menarche was significantly associated with 9 genes in normal tissues adjacent to ER+ tumors and 0 in normal tissues adjacent to ER− tumors.When comparing the 28 significant genes in ER+ tumor tissues and the 9 significant genes in ER+ adjacent normal tissues, 2 genes overlapped (HUS1 and PRKAG3).
In multivariable-adjusted competitive gene set enrichment analysis, 18 cancer hallmark pathways were significantly associated (FDR ≤ 0.1) with early menarche in tumor tissues; 23 pathways were significantly associated with early menarche in tumor-adjacent normal tissues (Table 2).Fifteen enriched pathways overlapped between tumor and normal tissues.In both tissue types, early menarche was associated with upregulation of pathways associated with proliferation (e.g.Myc, PI3K/AKT/ mTOR, cell cycle), oxidative stress (e.g.oxidative phosphorylation, unfolded protein response), and inflammation (e.g.pro-inflammatory cytokines IFNα and IFNγ ).Further, in both tissues, early menarche was associated with upregulation of adipogenesis.Normal tissues also showed enrichment of unique pathways like epithelial to mesenchymal transition (EMT) and angiogenesis, features indicative of a cancer-promoting microenvironment and tissue transformation, and downregulation of myogenesis.
Stratified by estrogen receptor status, 19 significant pathways were observed in ER-positive tumor tissue and 21 in matched ER-positive normal tissue, all of which mirrored the unstratified analysis with  upregulation of pathways involved in proliferation, cell stress, and cancer cell metabolism (Table S3); similar findings were observed in ER-negative tissues, with 9 pathways in tumor and 22 pathways in tumor-adjacent normal tissues detected with an FDR ≤ 0.1 (Table S4).
Analyses modeling age at menarche as continuous had very similar results (Table S5-S7), where we observed that increasing age at menarche was associated with downregulation of proliferation and cellular stress pathways.
We replicated the pathway analysis in TCGA (Table 3, Table S8).Although limited by sample size (N = 116), early menarche was associated with an upregulation of many of the same proliferation-related signaling pathways (MTORC1 signaling: FDR = 0.005; PI3K-AKT-MTOR signaling: FDR = 0.05), cellular stress pathways (Protein Secretion: FDR = 0.02), and inflammation pathways (Interferon Gamma Response: FDR = 0.11) that we observed in NHS.In addition, we observed the upregulation of several biologically related but unique pathways associated with early menarche, such as those involved in estrogen response (Estrogen Response Early: FDR = 0.05; Estrogen Response Late: FDR = 0.06) and others involved in inflammation and innate immune response (Allograft Rejection: FDR = 0.05; Complement: FDR = 0.06; Coagulation: FDR = 0.12).

N genes Direction P value FDR
Tumor (N = 156)

Early menarche-derived gene expression signature and risk of breast cancer recurrence in NHS and METABRIC
We next created an early menarche signature based on 28 genes selected from LASSO regularization regression and examined its association with 10-year diseasefree survival (Table 5).The majority of the 28 genes included within the signature were involved in three main biological processes: (1) cell stress response and metabolism (e.g.BAG1, HUS1, DRAM2); (2) cell proliferation, differentiation, and invasion (e.g. and (3) inflammation.In the NHS, individuals with higher early menarche-associated gene expression scores had an 18% increased risk of recurrence, though not statistically significant (N = 836, N events = 105, HR of highest vs. lowest score quartile = 1.18 [0.70-2.0],p-trend = 0.403, Fig. 2A).In METABRIC, after covariate adjustment, higher early menarche-associated gene expression score (based on the same 28 genes identified originally in NHS) was associated with a 58% higher risk of cancer recurrence (N = 952, N events = 310, HR for highest vs. lowest score quartile = 1.58 [1.10-2.25],p-trend = 0.02, Fig. 2B, Table S9).

Discussion
In this analysis of women with breast cancer in the NHS and NHSII, early menarche was strongly associated with cancer-promoting molecular changes in both tumor and normal-adjacent tissues, including, most notably, an enrichment of single genes and signaling pathways that drive cell proliferation and are commonly dysregulated in cancer.Similar gene expression differences were corroborated in a subset within TCGA with available data on age at menarche, with several additional pathways related to estrogen response also upregulated.Findings based on PAM50 metrics (molecular subtype, proliferation, risk of recurrence) were consistent with gene expression profiles that suggested association with more aggressive tumor disease.Our 28-gene early menarche-associated gene expression score was suggestively associated with worse survival in the NHS and significantly associated with worse survival in a larger external dataset, METABRIC.While previous work in the field has focused on understanding the relationship between menarche and breast cancer risk, our gene expression analyses offer insight connecting a classical early life breast cancer risk factor and associations with molecular changes within tumors occurring decades later.We identified 369 individual genes significantly associated with early menarche in tumor-adjacent normal tissues, many of which were associated with cell transformation, invasion, and tumorpromoting changes to the tissue microenvironment.Further investigation into these genes and whether they may Table 4 Early menarche is associated with more proliferative and aggressive tumors in breast cancer a Multinomial regression model with categorical dependent variables (PAM50 intrinsic molecular subtypes) b Linear regression model with continuous dependent variables (PAM50 proliferation and ROR scores) c Covariates: age at breast cancer diagnosis (continuous), year of diagnosis (continuous), tumor stage (1-4), chemotherapy (yes/no/unknown), radiation (yes/no/ unknown), endocrine therapy (yes/no/unknown), oral contraceptive use (current user/past user/never user/unknown), race (White/non-White), parity (continuous), BMI at 18 years of age (continuous), weight change (BMI at diagnosis -BMI at 18 years of age), and physical activity at time of diagnosis (continuous) Age at menarche was dichotomized and modeled as a categorical variable of "early" (< 12 years old) versus "normal" (≥ 12 years old) Score Range: PAM50 Proliferation Score Range: −1 to 1, PAM50 Risk of Recurrence Score Range: 0-100 hold any biological insights in the tumorigenic process in women that underwent menarche at an early age are warranted.As these normal-adjacent tissues more closely recapitulate the normal breast, it should also be explored whether these genes and the signaling pathways in which they are involved may represent changes that occur early in life.After stratification by ER status, 28 genes in ERpositive tumor tissues were significantly associated with early menarche; these genes ranged in their biological function, with common threads of cell adhesion, DNA damage, and metastatic cell survival.Robust associations linked early menarche with a host of signaling pathways that included proliferation, oxidative stress, cancer cell metabolism, DNA damage repair, and inflammation in both tumor and matched normal tissues in both ER-positive and ER-negative tissues.Of note, the similar pathway enrichment we observed in both ER+ and ER-tissues is concordant with other studies that have shown that early menarche increases risk equally among hormone receptor positive and hormone receptor negative breast cancer subtypes [1,2,[26][27][28].As it becomes increasingly apparent that the length of reproductive exposure to estrogens alone may not be the only factor underlying the heightened risk that accompanies early menarche, as was previously believed, more mechanistic work is needed in this area.
In addition to proliferative, metabolic, and stress pathways, we observed associations between early menarche and pathways related to the tissue microenvironment, which included downregulation of myogenesis.Tumorderived cytokines have been shown to impair myogenesis and alter the skeletal muscle immune microenvironment  [29]; indeed, we observed a positive association with proinflammatory cytokines and a negative association with myogenesis with early menarche.Normal adjacent tissues also showed an upregulation of epithelial-mesenchymal transition and angiogenesis, both of which promote cancer spread.Cancer-associated adipocytes are key players in breast cancer progression, undergoing metabolic reprogramming to support tumor cells through secretion of a variety of inflammatory and growth-promoting factors [30].Early menarche was associated with a significant upregulation of adipogenesis.Women with more adiposity during childhood tend to have earlier menarche, even though early life body fatness has been associated with reduced breast cancer risk [31].In the NHS, we recently showed that women with higher body fatness during childhood/ adolescence was associated with the downregulation of pathways involved in cell stress, proliferation, and inflammation in breast tumors.11 pathways identified for early life body fatness overlapped with our findings for early menarche, including 6 proliferation-related, 3 cell stressrelated, and 2 microenvironment-related pathways.Interestingly, all show distinctly inverse directionality, which is consistent with the known complex relationships between early life body size, menarche, and breast cancer risk.Though sample size was limited in TCGA, we observed trends and patterns in the tumor signaling pathway analyses similar to the NHS cohort.Four out of 10 of the significant pathways enriched in tumors from the TCGA directly overlapped with those identified within the NHS cohort, including upregulation of numerous cancer cell proliferation-, metabolism-, and inflammation-related pathways.In addition to those that directly replicated, tumors from women in the TCGA study population also exhibited a significant upregulation of distinct but related signaling pathways related to innate immune response, including complement, coagulation, and allograft rejection.Interestingly, early and late estrogen signaling was also upregulated in TCGA tumors from women with early menarche.As previously discussed, some have postulated that the increased breast cancer risk associated with early menarche may relate, in part, to higher levels and a longer exposure window to mitogenic estrogens.This hypothesis aligns with our data showing increased estrogen signaling in tumors was detected from women who underwent early menarche, even when adjusted for hormone receptor status.Some recent evidence also suggests that among women who experience menarche at an early age, those who have single nucleotide polymorphisms within genes involved in estrogen signaling are at higher risk of breast cancer than those who do not [32].Overall, our findings within TCGA closely mirror our data within the NHS cohort and support the conclusion that early menarche is associated with more proliferative and pro-tumorigenic breast tumor characteristics.While our study focuses on early menarche, which represents exposure to estrogen early in life, other reproductive risk factors can also act to increase estrogen levels throughout the life course (e.g.parity, breast-feeding, exogenous hormone use).Therefore, further investigation to understand whether they may impact tumor biology in a similar or distinct manner would be of interest.
The increased proliferation and other tumor-promoting pathways discovered in our gene expression analyses led us to hypothesize that tumors from women with early menarche may possess a more aggressive tumor phenotype.Leveraging PAM50 subtypes, we found early menarche was associated with basal-like and HER2enriched intrinsic molecular subtypes, which are considered more aggressive diseases with worsened prognosis [24], higher tumor proliferative index, and increased risk of recurrence score, which has been found to be highly predictive of risk of distant relapse, performing better than other methods of risk prediction [23], together bolstering our suspicions that these tumors may possess more aggressive molecular and pathological features.Concordantly, our NHS-derived gene expression signature that captures the molecular characteristics of tumors from women who experienced early menarche showed a significant, dose-response trend that was suggestively positively associated with cancer recurrence in NHS and significantly associated in METABRIC.This suggests our menarche-derived gene expression signature captured the molecular tumor features and associated prognostic impact in breast cancer patients who experienced early menarche.
In this study, we investigated the molecular features of breast tumors in relation to age at menarche, an established epidemiologic breast cancer risk factor.Early menarche was associated with more aggressive tumor molecular subtypes and characteristics, and our early menarche-derived gene expression signature was associated with a higher risk of breast cancer recurrence.Together, these results highlight how a common and increasingly prevalent early life exposure may influence the molecular and pathological phenotype of breast tumors later in life and how these changes relate to breast cancer prognosis.As the age of menarche onset continues to decline, better understanding of its influence on breast tumor biology and prognosis may lead to novel secondary prevention strategies in the future.

Fig. 1
Fig. 1 Study workflow of gene expression analyses

Fig. 2
Fig. 2 Association between early menarche and 10-year disease-free survival.Covariate−adjusted marginal survivor curves computed from Cox proportional hazards models for 10−year disease−free survival and early menarche gene expression score modelled as quartiles of expression, with 1 representing the lowest level of expression (the most dissimilar to early menarche) and 4 representing the highest level of expression (most representative of early menarche signature), in NHS ( A ) and METABRIC ( B ) cohorts.Covariates in NHS included age at breast cancer diagnosis (continuous), year of diagnosis (continuous), estrogen receptor status (yes/no), tumor stage (1-3), tumor grade (1-4), chemotherapy (yes/no/unknown), radiation (yes/no/unknown), endocrine therapy (yes/no/unknown), oral contraceptive use (current user/past user/never user/ unknown), race (White/non-White), parity (continuous), BMI at 18 years of age (continuous), weight change (BMI at diagnosis -BMI at 18 years of age), and physical activity at time of diagnosis (continuous).Covariates in METABRIC attempted to mirror NHS covariates as closely as possible, though not all were available; they included age at diagnosis, estrogen receptor status, batch, menopausal status, cancer stage, and treatment (chemotherapy, radiotherapy, and/or hormonal therapy)

Table 1
Participant characteristics of breast cancer cases in the NHS with tissue gene expression data (N = 846)

Table 2
Pathway enrichment analysis of age at menarche in breast tumors and normal-adjacent tissues

Table 5
Early menarche-derived gene expression signature